
fix: persist event-watcher state for clean validator warm restarts #339

Open
LandynDev wants to merge 3 commits into test from fix/event-watcher-state-persistence


@LandynDev
Collaborator

Summary

Restarting the validator on lena surfaced four problems:

  1. Log spam: sync_to walks back 600 blocks from the cold-start cursor, but the public finney RPC discards state older than ~240 blocks, so ~360 "State already discarded" warnings flood the log on every restart.
  2. Emissions routed to RECYCLE on the first scoring pass: while catch-up is in progress, the in-memory active_events/busy_events timelines are partial, so scoring sees most miners as inactive and the pool routes emissions to RECYCLE_UID.
  3. Busy-state drift: miners with swaps open at restart time look idle to scoring until the matching terminal event replays.
  4. Restart latency: cold init issues N per-hotkey get_miner_active_flag RPCs plus one get_active_swaps call before the first forward step.

Fix

Persist the event-watcher's reconstructed timeline to state.db and hydrate it on restart. A warm restart treats the DB as the source of truth and skips contract reads entirely. If the persisted cursor is more than one scoring window behind head, persistence is wiped and the watcher falls back to cold bootstrap: the chain has moved past replayable history, so the contract is the only authority left.
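
A minimal sketch of the restart decision described above, in Python since the project is tested with pytest. The helper choose_start_mode, the StartMode enum and expressing the scoring window as a block count are hypothetical illustrations; only the three outcomes (cold on a fresh DB, warm within one scoring window, wipe-and-cold beyond it) come from this PR.

```python
from enum import Enum
from typing import Optional


class StartMode(Enum):
    COLD = "cold"            # fresh DB: seed the mirrors from contract reads
    WARM = "warm"            # hydrate from state.db, no contract reads
    WIPE_AND_COLD = "wipe"   # cursor too stale: drop persistence, then cold


def choose_start_mode(
    persisted_cursor: Optional[int], head: int, scoring_window: int
) -> StartMode:
    """Hypothetical helper: classify a restart from the persisted cursor."""
    if persisted_cursor is None:
        # No cursor means nothing was ever persisted.
        return StartMode.COLD
    lag = head - persisted_cursor
    if lag <= scoring_window:
        # Every block since the cursor is still replayable within the window.
        return StartMode.WARM
    # The chain has moved past replayable history; only the contract is trusted.
    return StartMode.WIPE_AND_COLD
```

Here a cursor exactly one window behind head still counts as warm; whether the real boundary is inclusive is an assumption.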

Advancing the cursor per block means a crash mid-chunk replays at most one block instead of an entire 50-block chunk. bootstrapped_swap_ids is persisted alongside the cursor, so warm restarts keep the skip-list that prevents double-counting a cold-seeded +1 when its SwapInitiated event is replayed.
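
To illustrate why the skip-list matters, here is a sketch of an in-memory busy counter. The class BusyMirror and its method names are hypothetical, and persistence of bootstrapped_swap_ids to state.db is omitted. Cold bootstrap counts each open swap once; when the SwapInitiated event for a seeded swap is later replayed, the skip-list suppresses a second +1.

```python
from collections import defaultdict


class BusyMirror:
    """Hypothetical in-memory busy counter with a bootstrap skip-list."""

    def __init__(self) -> None:
        self.busy: dict[str, int] = defaultdict(int)
        self.bootstrapped_swap_ids: set[str] = set()

    def seed_from_contract(self, active_swaps: dict[str, str]) -> None:
        # Cold bootstrap: each open swap (swap_id -> hotkey) counts +1 once
        # and is remembered so its replayed SwapInitiated is not counted again.
        for swap_id, hotkey in active_swaps.items():
            self.busy[hotkey] += 1
            self.bootstrapped_swap_ids.add(swap_id)

    def on_swap_initiated(self, swap_id: str, hotkey: str) -> None:
        # A seeded swap already contributed its +1 during bootstrap.
        if swap_id in self.bootstrapped_swap_ids:
            return
        self.busy[hotkey] += 1

    def on_swap_terminal(self, swap_id: str, hotkey: str) -> None:
        # Defensive clamp so a stray terminal event cannot go negative.
        self.busy[hotkey] = max(0, self.busy[hotkey] - 1)
        # Once the swap closes it no longer needs a skip-list entry.
        self.bootstrapped_swap_ids.discard(swap_id)
```

If the skip-list were lost on restart, a replayed SwapInitiated for a seeded swap would push the counter to 2 and the miner would stay busy after the swap closed.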

Pruned-block exceptions during the post-restart catch-up collapse into a single INFO summary per sync_to ("360 pruned blocks skipped (blocks 4123..4482)") instead of one warning per block.
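
A sketch of the log-collapsing behaviour using the standard logging module. The sync_to signature, the fetch_block callback and the is_pruned_error check are hypothetical; recognizing a pruned block by the "State already discarded" message text is an assumption. The counter is local to each call, so it resets between sync_to invocations, and unrelated exceptions still log per block.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("event_watcher")


def is_pruned_error(exc: Exception) -> bool:
    # Assumption: pruned state is recognized by the RPC error text.
    return "State already discarded" in str(exc)


def sync_to(start: int, end: int, fetch_block: Callable[[int], None]) -> int:
    """Hypothetical catch-up loop; returns the number of pruned blocks skipped."""
    pruned = 0
    first: Optional[int] = None
    last: Optional[int] = None
    for block in range(start, end + 1):
        try:
            fetch_block(block)
        except Exception as exc:
            if not is_pruned_error(exc):
                # Unrelated failures are still reported one per block.
                logger.warning("block %d failed: %s", block, exc)
                continue
            pruned += 1
            if first is None:
                first = block
            last = block
    if pruned:
        # One INFO line summarizes the whole pruned range.
        logger.info("%d pruned blocks skipped (blocks %d..%d)", pruned, first, last)
    return pruned
```

With blocks 4123..4482 pruned, this emits a single "360 pruned blocks skipped (blocks 4123..4482)" line.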

Test plan

  • pytest tests/test_event_watcher.py — 43 passed (29 new)
  • pytest tests/ — 496 passed
  • ruff format + ruff check clean

LandynDev added 3 commits May 18, 2026 21:30
Adds active_events, busy_events, event_watcher_meta (cursor) and
bootstrapped_swaps tables plus the methods that read, write, prune and
reset them. Anchor-preserving prune mirrors the existing rate_events
rule: the latest row per hotkey is kept even when it falls before the
cutoff, so window-start reconstruction stays correct after pruning.
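
An anchor-preserving prune can be sketched in SQLite: delete rows older than the cutoff, except each hotkey's latest pre-cutoff row, which records that hotkey's state at the window start. The active_events table name comes from this PR; its columns (hotkey, block, active) and the function prune_active_events are hypothetical.

```python
import sqlite3


def prune_active_events(conn: sqlite3.Connection, cutoff: int) -> int:
    """Delete rows older than `cutoff`, keeping each hotkey's latest pre-cutoff row.

    The kept row is the anchor: it gives the hotkey's state at the start of
    the scoring window. Column names are hypothetical.
    """
    cur = conn.execute(
        """
        DELETE FROM active_events
        WHERE block < :cutoff
          AND block < (
              SELECT MAX(e2.block) FROM active_events AS e2
              WHERE e2.hotkey = active_events.hotkey AND e2.block < :cutoff
          )
        """,
        {"cutoff": cutoff},
    )
    conn.commit()
    return cur.rowcount  # number of rows removed
```

For a hotkey with rows at blocks 10, 20 and 40 and a cutoff of 30, only block 10 is removed: block 20 is the anchor and block 40 is inside the window.
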

initialize() now branches on the persisted cursor: a fresh DB still
cold-bootstraps from the contract; a cursor within one scoring window
of head hydrates the in-memory active/busy mirrors from state.db
without touching the contract; a cursor further back wipes
persistence and falls back to cold.

Transitions write through on every record_active_transition and
apply_busy_delta. Cursor advances per block at the tail of
process_block, so a crash mid-chunk forces a replay of at most one block.
bootstrapped_swap_ids is persisted so warm restarts preserve the
skip-list that prevents double-counting the seeded +1 against the
SwapInitiated replay.
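
One way to get the per-block cursor guarantee, sketched with Python's sqlite3. The PR writes transitions through individually and advances the cursor at the tail of process_block; this sketch wraps both in a single transaction, which is an assumption rather than necessarily what the PR does. The table names active_events and event_watcher_meta come from the PR; their columns, the 'cursor' key and the helper read_cursor are hypothetical.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS active_events (hotkey TEXT, block INTEGER, active INTEGER);
CREATE TABLE IF NOT EXISTS event_watcher_meta (key TEXT PRIMARY KEY, value INTEGER);
"""


def process_block(
    conn: sqlite3.Connection, block: int, transitions: Iterable[Tuple[str, bool]]
) -> None:
    """Write one block's transitions, then advance the cursor, atomically."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT INTO active_events (hotkey, block, active) VALUES (?, ?, ?)",
            [(hk, block, int(active)) for hk, active in transitions],
        )
        # Cursor moves at the tail of the block, inside the same transaction.
        conn.execute(
            "INSERT OR REPLACE INTO event_watcher_meta (key, value) VALUES ('cursor', ?)",
            (block,),
        )


def read_cursor(conn: sqlite3.Connection) -> Optional[int]:
    row = conn.execute(
        "SELECT value FROM event_watcher_meta WHERE key = 'cursor'"
    ).fetchone()
    return row[0] if row else None
```

If the process dies partway through a block, neither its transitions nor the cursor move, so the restart resumes at that same block.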

Pruned-block exceptions during get_block_hash/get_events on a public
finney node (which keeps only ~240 blocks of state) collapse into one
INFO summary per sync_to instead of ~360 per-block warnings during
the catch-up after restart.

Adds five test classes:
- TestStateStoreEventTables: round-trip + anchor-preserving prune for
  the four new tables.
- TestEventWatcherWarmRestart: cold writes anchors, warm hydrates
  without contract reads, long outage falls back to cold.
- TestEventWatcherWriteThrough: transitions persist; cursor advances
  per block.
- TestEventWatcherLogHygiene: pruned-block error collapses into a
  single summary, unrelated exceptions still log per-block, counter
  resets between sync_to calls.
- TestSwapOutcomesIdempotency: re-applying SwapCompleted does not
  duplicate swap_outcomes rows.
xiao-xiao-mao (bot) added the bug label May 19, 2026

Labels

bug Something isn't working
